Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Point cloud is an important type of geometric data structure for many embedded applications such as autonomous driving and augmented reality. Current Point Cloud Networks (PCNs) have proven to achieve great success in using inference to perform point cloud analysis, including object part segmentation, shape classification, and so on. However, point cloud applications on the computing edge require more than just the inference step. They require an end-to-end (E2E) processing of the point cloud workloads: pre-processing of raw data, input preparation, and inference to perform point cloud analysis. Current PCN approaches to support end-to-end processing of point cloud workload cannot meet the real-time latency requirement on the edge, i.e., the ability of the AI service to keep up with the speed of raw data generation by 3D sensors. Latency for end-to-end processing of the point cloud workloads stems from two reasons: memory-intensive down-sampling in the pre-processing phase and the data structuring step for input preparation in the inference phase. In this paper, we present HgPCN, an end-to-end heterogeneous architecture for real-time embedded point cloud applications. In HgPCN, we introduce two novel methodologies based on spatial indexing to address the two identified bottlenecks. In the Pre-processing Engine of HgPCN, an Octree-Indexed-Sampling method is used to optimize the memory-intensive down-sampling bottleneck of the pre-processing phase. In the Inference Engine, HgPCN extends a commercial DLA with a customized Data Structuring Unit which is based on a Voxel-Expanded Gathering method to fundamentally reduce the workload of the data structuring step in the inference phase. The initial prototype of HgPCN has been implemented on an Intel PAC (Xeon+FPGA) platform. Four commonly available point cloud datasets were used for comparison, running on three baseline devices: Intel Xeon W-2255, Nvidia Xavier NX Jetson GPU, and Nvidia 4060ti GPU. These point cloud datasets were also run on two existing PCN accelerators for comparison: PointACC and Mesorasi. Our results show that for the inference phase, depending on the dataset size, HgPCN achieves speedup from 1.3× to 10.2× vs. PointACC, 2.2× to 16.5× vs. Mesorasi, and 6.4× to 21× vs. Jetson NX GPU. Along with optimization of the memory-intensive down-sampling bottleneck in pre-processing phase, the overall latency shows that HgPCN can reach the real-time requirement by providing end-to-end service with keeping up with the raw data generation rate.more » « lessFree, publicly-accessible full text available November 2, 2025
-
Description / Abstract: In order to effectively provide INaaS (Inference-as-a-Service) for AI applications in resource-limited cloud environments, two major challenges must be overcome: achieving low latency and providing multi-tenancy. This paper presents EIF (Efficient INaaS Framework), which uses a heterogeneous CPU-FPGA architecture to provide three methods to address these challenges (1) spatial multiplexing via software-hardware co-design virtualization techniques, (2) temporal multiplexing that exploits the sparsity of neural-net models, and (3) streaming-mode inference which overlaps data transfer and computation. The prototype EIF is implemented on an Intel PAC (shared-memory CPU-FPGA) platform. For evaluation, 12 types of DNN models were used as benchmarks, with different size and sparsity. Based on these experiments, we show that in EIF, the temporal multiplexing technique can improve the user density of an AI Accelerator Unit from 2$$\times$$ to 6$$\times$$, with marginal performance degradation. In the prototype system, the spatial multiplexing technique supports eight AI Accelerators Unit on one FPGA. By using a streaming mode based on a Mediated Pass-Through architecture, EIF can overcome the FPGA on-chip memory limitation to improve multi-tenancy and optimize the latency of INaaS. To further enhance INaaS, EIF utilizes the MapReduce function to provide a more flexible QoS. Together with the temporal/spatial multiplexing techniques, EIF can support 48 users simultaneously on a single FPGA board in our prototype system. In all tested benchmarks, cold-start latency accounts for only approximately 5\% of the total response time.more » « less
-
Abstract Though consistently shown to detect mammographically occult cancers, breast ultrasound has been noted to have high false-positive rates. In this work, we present an AI system that achieves radiologist-level accuracy in identifying breast cancer in ultrasound images. Developed on 288,767 exams, consisting of 5,442,907 B-mode and Color Doppler images, the AI achieves an area under the receiver operating characteristic curve (AUROC) of 0.976 on a test set consisting of 44,755 exams. In a retrospective reader study, the AI achieves a higher AUROC than the average of ten board-certified breast radiologists (AUROC: 0.962 AI, 0.924 ± 0.02 radiologists). With the help of the AI, radiologists decrease their false positive rates by 37.3% and reduce requested biopsies by 27.8%, while maintaining the same level of sensitivity. This highlights the potential of AI in improving the accuracy, consistency, and efficiency of breast ultrasound diagnosis.more » « less
An official website of the United States government
